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High-frequency data observed on the prices of financial assets 
are commonly modeled by diffusion processes with micro-structure 
noise, and realized volatility-based methods are often used to estimate 
integrated volatility. For problems involving a large number of assets, 
the estimation objects we face are volatility matrices of large size. 
The existing volatility estimators work well for a small number of 
assets but perform poorly when the number of assets is very large. 
In fact, they are inconsistent when both the number, p, of the assets 
and the average sample size, n, of the price data on the p assets 
go to infinity. This paper proposes a new type of estimators for the 
integrated volatility matrix and establishes asymptotic theory for the 
proposed estimators in the framework that allows both n and p to 
approach to infinity. The theory shows that the proposed estimators 
achieve high convergence rates under a sparsity assumption on the 
integrated volatility matrix. The numerical studies demonstrate that 
the proposed estimators perform well for large p and complex price 
and volatility models. The proposed method is applied to real high- 
frequency financial data. 

1. Introduction. Intra-day data observed on the prices of financial as- 
sets are often referred to as high-frequency financial data. Advances in tech- 
nology make high-frequency financial data widely available nowadays for a 
host of different financial instruments on markets of all locations and at 
various scales, from individual bids to buy and sell, to the full distribu- 
tion of such bids. The wide availability, in turn, stirs up an increasing de- 
mand for better modeling and statistical inference regarding the price and 
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volatility dynamics of the assets. Diffusion processes are often employed to 
model high-frequency financial data, and various methodologies have been 
developed in past several years to estimate integrated volatility (or diffusion 
variance) over a period of time, say, a day. For a single asset, estimators of 
integrated volatility include realized volatility (RV) [Andersen et al. (2003), 
Barndorff-Nielsen and Shephard (2002)], bi-power realized variation (BPRV) 
[Barndorff-Nielsen and Shephard (2006)], two-time scale realized volatility 
(TSRV) [Zhang, Mykland and A'it-Sahalia (2005)], multiple-time scale re- 
alized volatility (MSRV) [Zhang (2006)], wavelet realized volatility (WRV) 
[Fan and Wang (2007)], kernel realized volatility (KRV) [Barndorff-Nielsen 
et al. (2008a)], pre-averaging realized volatility [Jacod et al. (2007)] and 
Fourier realized volatility (FRV) [Mancino and Sanfelici (2008)]. For mul- 
tiple assets, we encounter a so-called non-synchronization problem which 
refers to as the fact that transactions for different assets often occur at 
distinct times, and the high-frequency prices of the assets are recorded at 
mismatched time points. Hayashi and Yoshida (2005) and Zhang (2007) have 
proposed to estimate integrated co-volatility of two assets based on overlap 
intervals and previous ticks. Barndorff-Nielsen and Shephard (2004) consid- 
ered estimation of integrated co-volatility for synchronized high-frequency 
data. 

A large number of assets are usually involved with in asset pricing, portfo- 
lio allocation, and risk management. One key problem we face is to estimate 
an integrated volatility matrix of large size for the assets. The scenario fits 
in to the so-called small n and large p or large n but much larger p problem, 
a current hot topic in statistics. The existing volatility estimation methods 
work well only for the cases of a single asset or a small number of assets 
where volatility is either scalar or a small matrix. Their poor behaviors for 
a large volatility matrix are indicated by random matrix theory and large 
covariance matrix estimation. Although idealized, the following example is 
still able to illustrate the point. Consider p assets over unit time interval 
with all prices following independent Black-Scholes model with zero drift 
and unit volatility. Then the log prices obey independent standard Brow- 
nian motions, and the true integrated volatility matrix T is equal to the 
identity matrix I p . Assume that we observe all p prices without noise at 
the same time grids = £/n for £ = 0,1, . . . ,n. The corresponding returns 
are i.i.d. normal random variables with mean zero and variance 1/n. For 
this case, the existing best estimator of T is the RV, T, with the following 
representation: 

~ 1 n 

T=(T ij ), Tij = -^2 z it z jt for 1 < i, j < p 
e=i 

where Za, I = 1, . . . , n, i = 1, . . . ,p, are independent iV(0, 1) random vari- 
ables. It is widely known that T is a poor estimator of T when both n and 
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p are large [Johnstone (2001), Johnstone and Lu (2009), El Karoui (2007, 
2008), Bickel and Levina (2008a, 2008b)]. In fact, for n and p both going to 
infinity but p/n—>c, the largest eigenvalue of T asymptotically behaves like 
(1 + -y/c) 2 while all true eigenvalues are equal to 1. 

This paper develops a methodology for estimating large volatility matrices 
based on high-frequency financial data. We establish asymptotic theory for 
the proposed estimators under sparsity or decay assumptions on integrated 
volatility matrices as both n and p go to infinity. The estimators proposed 
in this paper are constructed as follows. In stage one we select grids as 
pre-sampling frequencies, construct a realized volatility matrix using previ- 
ous tick method according to each pre-sampling frequency and then take 
the average of the constructed realized volatility matrices as the stage one 
estimator, which we call the ARVM estimator. In stage two we regularize 
the ARVM estimator to yield good consistent estimators of the large inte- 
grated volatility matrix. We consider two regularizations: thresholding and 
banding, developed by Bickel and Levina (2008a, 2008b) in the context of 
covariance matrix estimation. Thresholding a matrix is to retain only the el- 
ements whose absolute values exceed a given value which is called threshold 
and to replace the others by zero. Thresholding technique was introduced in 
wavelet literature for function estimation and image analysis [Wang (2006)] 
where a function is known to have sparse representations in the sense that 
there are a relatively small number of important terms in its representa- 
tions, but neither the number nor the locations of the important terms are 
known. We use thresholding to pick up the important terms for construct- 
ing estimators. For a sparse matrix, the small number of elements with large 
values are important. We need to locate those elements and estimate their 
values. The thresholding ARVM estimator is designed to find its elements 
of large magnitude along with their locations. "Banding" a matrix is to 
keep only the elements in a band along its diagonal and replace others by 
zero. Banding is analog to smoothing in nonparametric function estimation 
where in Taylor or orthogonal expansions of a given function, the locations 
of important terms are known. We simply choose the important terms for 
building estimators. For a matrix with elements decaying away from its di- 
agonal, important terms are elements within a band along the diagonal, and 
banding ARVM estimator is used to select its elements within the band. 
The regularized ARVM estimators provide better volatility estimation that 
can greatly enhance portfolio allocation and risk management. With the 
volatility matrix estimators obtained from high-frequency data, we are able 
to investigate volatility dynamics directly and significantly improve volatil- 
ity forecasting. See Andersen, Bollerslev and Diebold (2004) and Wang, Yao 
and Zou (2008). 
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We have shown that for a sparse integrated volatility matrix, the thresh- 
olded ARVM estimator not only consistently estimates the integrated volatil- 
ity matrix but also achieves a high convergence rate when p is allowed to 
grow as fast as a power of sample size with the power depending on the num- 
ber of moments imposed on volatility processes and micro-structure noise. 
When it is known that the integrated volatility matrix has elements decaying 
away from its diagonal, the banded ARVM estimator is consistent and may 
enjoy a slightly higher convergence rate than the thresholded ARVM estima- 
tor. We have conducted extensive simulation studies for sophisticated price 
and volatility models with large p. The simulation studies demonstrate that 
the proposed estimators perform well for finite p and sample size. The pro- 
posed method is applied to high-frequency data on 630 stocks traded in the 
Shanghai Stock Exchange. 

The problem considered in this paper is much more complex than covari- 
ance matrix estimation, and our technical analyses rely on delicate treat- 
ments of diffusion processes and noise. Consequently the assumptions im- 
posed and convergence rates obtained are different from those in the co- 
variance matrix context. First, because of micro-structure noise in high- 
frequency financial data, true assets' prices are not directly observable; sec- 
ond, observations for continuous price processes are available only at dis- 
crete time points; third, price data on multiple assets are nonsynchronized; 
fourth, randomness in the observed data is caused by micro-structure noise 
as well as uncertainties in price and volatility processes. Because of noise, 
nonsynchronization and discretization, the convergence rates for volatility 
matrix estimation depend on sample size through a rate slower than the 
usual square root rate for covariance matrix estimation. Due to the multiple 
random sources in the data, for commonly used price and volatility mod- 
els, the observed data cannot be sub-Gaussian. As a result, the convergence 
rates increase in p faster for volatility matrix estimation than for covariance 
matrix estimation. 

The rest of the paper is organized as follows. Section 2 introduces the basic 
models for high-frequency data and the estimation problem. The proposed 
methodology is presented in Section 3. The asymptotic theory is established 
in Section 4. Numerical studies are reported in Section 5. All proofs are 
relegated in Sections 6 and 7. 

2. The set-up. 

2.1. Observed data. Consider p assets, and let Xi(t) be the true log price 
at time t of the ith asset, i = 1, . . . , p. Suppose that we have high-frequency 
data for which the true log price of the ith asset is observed at times tn, 
£ = l,...,m, and denote by Yi(tn) the observed log price of the ith asset at 
time tj£. Because of the nonsynchronization problem, typically tn ^ tj£ for 
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any i ^ j. The high-frequency data are usually contaminated with micro- 
structure noise in the sense that the observed log price Yi(tn) is a noisy 
version of the corresponding true log price Xi(tn). It is common to assume 

(1) Yi(tie)=Xi(tie) + ei(tu), i = l,...,p,£=l,...,rii, 

where £i(ta), i = 1, . . . ,p, I = 1,. .. , rtj are independent noises with mean 
zero, for each fixed i, £i(ta), £ = 1, . . . ,rtj are i.i.d. random variables with 
variance rji, and £j(-) and -Xj(-) are independent. 

In the realized volatility literature, it is often assumed that micro-structure 
noise is i.i.d. and independent of the underlying price process. The simplistic 
assumption is used to study the effect of micro-structure noise on the volatil- 
ity estimation. Recently Hansen and Lunde (2006) and Kalnina and Linton 
(2008), among others, have considered univariate micro-structure models 
where micro-structure noise has serial correlation and is correlated with the 
underlying price process. 

2.2. Price model. Let X(i) = (X\(t), . . . , X p (t)) T be the vector of the 
true log prices at time t of p assets. Following finance theory we assume 
that X(t) obeys a continuous-time diffusion model, 

(2) dX(t) = ^ t dt + aJdB t , ie[0,l], 

where /x t = (ni(t), . . . , ^ p {t)) T is a drift vector, B t = {B\ t , . . . ,B pt ) T is a 
standard p-dimensional Brownian motion [i.e., Ba are independent standard 
Brownian motions] and at is a p by p matrix. The volatility of X(t) is given 
by 

lif) = ilij(t))i<i,j< P = Ot*t, 
and its quadratic variation is equal to 

[X,X] t = [\(s)ds=( [ 7ij (s)ds) , te[0,l]. 

J \J0 J l<i,j<p 

Our goal is to estimate the integrated volatility matrix, 

r = {Vij)i< hj < p = f 7(i) dt=( [ 7lj (t) dt] 

Jo \Jo / l<i,j<p 

based on noisy and nonsynchronized observations Yi(ta), £ = 1, . . . ,rii, i = 
l,...,p. 
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3. Estimation methodology. 

3.1. Realized volatility matrix. Fix an integer m and take r r ,r = 1, . . . ,m, 
to be the pre-determined sampling frequency. Let r = {r r , r = 1, . . . , m}. For 
asset i, define previous-tick times 

7i )T . = m&x{tie <T r ,£ = l,.. .,rii}, r = l,...,m. 

Based on r we define realized co-volatility between assets i and j by 

m 

(3) f y (T) = X^KrO " ^Kr-OIK-C^r) " ^far-l)], 

r=l 

and realized volatility matrix by 

(4) f(T) = (P«(T)). 

The pre-determined sampling frequency r is usually selected as regular grids. 
For a fixed m, there are if = [n/m] classes of nonoverlap regular grids given 
by 

T k = {r/m,r = 1, . . . , m} + (fc — l)/n = {r/m + {k — l)/n,r = 1, . . . , m}, 

(5) 

where k = 1 , . . . , K and n is the average sample size 

v 



= - vv 

^ 1=1 

For each sampling frequency r fc , using (3) and (4) we define realized co- 
volatility Tij(r k ) between assets i and j and realized volatility matrix r(r fc ). 
Define 

(6) % 4e^-(^)' r = ft) 4e^' 

fc=l fe=l 

Like TSRV in the univariate case [Zhang, Mykland and A'it-Sahalia (2005)], 
r is the average of K realized volatility matrices T(r k ). We use T k to sub- 
sample data for computing r(r fc ) and then take their average to define Y. 
The purpose of subsampling and averaging is to handle noise and yield a 
better estimator. 

We need to adjust the diagonal elements of T to account for the noise 
variances. Set rj = diag(r/i, . . . ,rj p ) where r\i is the variance of noise 
We estimate r]i by 

(7) fc=2- E^M-^i^-i)] 3 , 
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and denote by fj = diag(%, . . . ,rj p ) the estimator of rj. Define an estimator 
of T by 

(8) f = (Y ij )=Y-2mrj, 

that is, we estimate element Ty of Y by Ty for i ^ j and Ya — 2mrji for 
i= j. The diagonal elements of Y are equal to TSRV of Zhang, Mykland 
and Ai't-Sahalia (2005). We call Y the averaging realized volatility matrix 
(ARVM) estimator. 

High-frequency financial data are usually not equally spaced nor synchro- 
nized, and thus observations may be more or less dense for some assets than 
others or in some time intervals than others. An asset may have no obser- 
vation between two consecutive time points in a sampling frequency; then 
the term involving these two consecutive time points in (3) is equal to zero, 
and from (6) we can see that the ARVM estimator automatically adjust for 
data with varying denseness. 

3.2. Regularize ARVM estimator. For small p, Y provides a good esti- 
mator for r. However, Y has a poor performance when p gets very large. It 
is well known that even for constant 7(i), when n and p both go to infinity, 
estimators like T are inconsistent. In particular, when p is very large, the 
eigenvalues and eigenvectors of Y are far from those corresponding to V [see 
Bickel and Levina (2008a, 2008b), Johnstone (2001) and Johnstone and Lu 
(2009)]. 

We need to impose some structure on T and regularize T in order to esti- 
mate r consistently. Following Bickel and Levina (2008a, 2008b) we consider 
decay or sparsity assumptions on T and regularize T with banding or thresh- 
olding as follows. 

Decay condition: We assume that the elements of T decay when moving 
away from its diagonal, 

M 

(9) |rd<7— T— T^TT' l<iJ<p,E[M]<C, 

where M is a positive random variable, and C and a are positive generic 
constants. 

Sparsity condition: We assume that Y satisfies 

v 

(10) ^ M7r (P)> i = l,..., P ,E[M]<C, 

3=1 

where M is a positive random variable, < S < 1, and n(p) is a deterministic 
function of p that grows very slowly in p. 

Examples of ir(p) include 1, logp and a small power of p. The case of 5 = 
in (10) corresponds so that each row of Y has at most Miv(p) number of non- 
zero elements. Decay condition (9) corresponds to a special case of sparsity 
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condition (10) with 6 = l/(ct + 1) and vr(p) = logp or l/(a + 1) < 5 < 1 and 
tt(p) = 1. 

The decay condition depends on the order of p assets in the log price vec- 
tor X(t). As stocks have no natural ordering, the decay condition may not 
hold for real volatility matrices of stock returns. As a result, for volatility 
matrix estimation in financial applications, sparsity is much more realistic 
than the decay assumption. Examples of sparse matrices include block di- 
agonal matrices, matrices with decay elements from diagonal, matrices with 
relatively small number of nonzero elements in each row or column and ma- 
trices obtained by randomly permuting rows and columns of above matrices. 

For r satisfying decay condition (9), its large elements are within a band 
along its diagonal, and the elements outside the band are negligible. We 
regularize T by banding, which is to keep only its elements in a band along 
its diagonal and replace others by zero. Specifically, the definition of banding 
I 1 is given by 

B 6 [f] = (fyl(|i-j|<6)), 

where b is a banding parameter, and j\ < b) is the indicator of \i — 

j\ < b}. The (i,j)th element of Bf,[T] is equal to for \i — j\ < b and zero, 
otherwise. We call £>b[r] the BARVM estimator. 

If r satisfies sparsity condition (10), its important elements are those 
whose absolute values are above a certain threshold. We regularize T by 
thresholding which is to retain its elements whose absolute values exceed a 
given value and replace others by zero. Specifically, we define the threshold- 
ing of r by 

T m \T] = (T ij l(\f ij \>m)), 

where w is threshold. The (i,j)th element of 7^7 [T] is equal to Tjj if its 
absolute value is greater or equal to w and zero, otherwise. We call 7^ [T] 
TARVM estimator. 

Like most of existing co-volatility matrix estimators, we cannot guaran- 
tee the positiveness of the ARVM estimator for finite sample. As the band- 
ing and thresholding procedures do not resolve the positiveness issue, the 
BARVM and TARVM estimators may not be positive for a finite sample. Re- 
cently Barndorff-Nielsen et al. (2008b) has developed a kernel-based method 
with refresh sampling time technique to produce a semi-positive co- volatility 
matrix estimator. The estimator is designed for fixed p and must suffer from 
the same drawback as the ARVM estimator for large p; it will be interesting 
to apply the regularization procedures to the semi-positive matrix estimator 
and investigate their asymptotic behaviors for large p and n. 

Banding is analog to smoothing in nonparametric function estimation 
where in the representations of a target function by Taylor or orthogonal 
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expansions the locations of important terms in the expansions are known. 
We simply select those important terms to keep for building estimators. 
For a matrix with decaying elements from its diagonal, important terms are 
elements within a band along the diagonal, and banding is used to select the 
elements within the band. Thresholding is utilized for estimating a function 
with sparse representations, where we know that there are a relatively small 
number of important terms in its representations, but neither the number 
nor the locations of the important terms are known. We use thresholding to 
pick up the important terms for constructing estimators. For a sparse matrix, 
all we know is that a relatively small number of the elements with large 
values essentially matter. We need to locate those elements and estimate 
their values. Thresholding is designed to find the elements of large magnitude 
and their locations. 

For the BARVM and TARVM estimators, we need to select proper val- 
ues for banding parameter b and threshold w from data. Data-dependent 
methods for selecting b and w are illustrated at the end of Section 5.3 for 
simulated data and at the beginning of Section 5.5 for real data. 



4. Asymptotic theory. First we fix some notations for the theoretical 
analysis. Given a p-dimensional vector x = (x±, . . . , x p ) T and a p by p matrix 
U = (Uij), define their ^-norms as follows: 

/ p \l/d 

\\x\\d= i^2\xi\ d \ , ||U|| d = sup{||Ux|| d ,||x|| d = 1}, d = l,2,oo. 

Then 1 1 XJT 1 1 2 is equal to the square root of the largest eigenvalue of UU T , 

p p 
||U||i = max ) \Uij\, ||U||oo = max ) \U{j\, 
i=l j=l 



and 



|U||1 < ||U||i||U| 



For symmetric U, ||U||2 is equal to its largest absolute eigenvalue, and 

||U|| 2 <||U||i = ||U||oo. 

Next we state some technical conditions. 

Al: For some j3 > 2, 

max max EU^aitW 1 < oo, max max E\\ui(t)\ 2/3 } < oo, 

l<i<pO<t<l l<i<pO<t<l 

max E[\Ei(t if >)\ 2fS ] < oo. 

l<i<p 
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A2: Each of p assets has at least one observation between and t* +1 . 
With n = (n\ + • • • + n p ) /p, we assume 

C\ < min — < max — < C2, max max \ta — tj = 0{n~ l ), 

l<i<p n l<i<P n l<i<pl<i<ni 

m = o(n). 

Theorem 1. Under models (1) and (2) and conditions Al and A2 we 
have for all 1 < i, j < p, 

(ii) £(|r\-r 4 /)<ce£, 

where C is a generic constant free of n and p, and the convergence rate e« 
given below is equal to the sum of terms with powers of n and K = [n/m] 
which depend on whether the observed data in the model specification have 
micro-structure noise or not. 

(1) If there is micro- structure noise in model (1), 

e P n = (Kn- 1 / 2 )^ + K-P' 2 + (n/K)-^ 2 + + n^' 2 . 

Thus with K ~ ra 2 / 3 we have e n ~ n _1//6 . 

(2) If there is no micro-structure noise [i.e. £i(tu) = and Yi(tn) = Xi{ta)J 
in model (1), 

ei = (n/K)-V 2 + K-P + n~PI 2 . 
Thus with K ~ ra 1 / 3 we have e n ~ n" 1 / 3 . 

Remark 1. The convergence rate e n can be attributed to three sources 
due to noise, nonsynchronization and discrete observations for continuous 
process X(i). Because of micro-structure noise in high-frequency financial 
data, true log-price X(t) is not directly observable. Furthermore, as a con- 
tinuous process, X(t) is observed with noise only at discrete time points. 
Consequently the convergence rate e n is slower than n" 1 / 2 . In fact, the opti- 
mal convergence rate for the univariate noise case is n _1//4 ; the nonsynchro- 
nization for multiple assets further complicates the problem. The terms in 
convergence rates given by Theorem 1 can be identified to associate with 
specific sources as follows. The terms (Kn~ l / 2 )~P + K~^^ 2 in are due to 
noise with + n~^l 2 contributed by nonsynchronization. Because X(t) 
is observed at discrete time points, we need to discretize X(t) and use its 
discretization to approximate integrated volatility matrix. Term (n/ K)~^/ 2 
in e^ is attributed to the approximation error due to the discretization of 
X(£). These are clearly spelled out by Propositions 1-3 in Section 7 for the 
proof of Theorem 1. The contributions of the three sources to the conver- 
gence rates for TSRV, MSRV, realized co-volatility estimators have been 
shown in Zhang, Mykland and Ait-Sahalia (2005) and Zhang (2006, 2007). 
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Theorem 2. Assume that T satisfies sparsity condition (10). Then un- 
der models (1) and (2) and conditions Al and A2, we have 

||r ro [f ] - r|| 2 < ||r ro [f ] - riu = o P (7r(p)[e n p 2 ^/i niP ] 1 - 5 ), 

where e n is given in Theorem 1, w = e n p 2 ^h n ^ p , and h np is any sequence 
converging to infinity arbitrarily slow with one example h UjP = loglog(n Ap). 

Remark 2. The convergence rate in Theorem 2 is nearly equal to 7r(jp) [c n x 
Since e n ~ n ' for the noise case and e n ~ n ' for the noiseless 
case, in order to make e n p 2 ^ go to zero, p needs to grow more slowly than 
r?/ 3 / 12 for the noise case and for the noiseless case. 

Theorem 3. Assume that T satisfies decay condition (9). Then under 
models (1) and (2) and conditions Al and A2, we have that 

\\B b [t] - T\\ 2 < \\B b [t] - r|U = Op([e n p 1/ ^] Q/(Q+1+1/ ^ ) ), 
where we select banding parameter b of order (e n p 1 /^) _1// ( Q+1+1// ^^ . 

Remark 3. For T satisfying decay condition (9), the sparsity condition 
is held with 5 = l/(a + 1) and vr(p) = logp. The convergence rate corre- 
sponding to Theorem 2 under the sparsity condition has a leading factor of 
order [e n p 2 ^] a ^ a+1 \ Comparing it with the rate in Theorem 3, we conclude 
that the two convergence rates are quite close for reasonably large j3. 

Remark 4. The convergence rates in Bickel and Levina (2008a, 2008b) 
for large covariance matrix estimation increase in matrix size p through a 
power of logp, but the convergence rates in Theorems 2 and 3 grow with 
p through a power of p. The slower convergence rates here are due to the 
intrinsic complexity of our problem. The \ogp factor in the convergence 
rates of covariance matrix estimation is attributed to Gaussianity or sub- 
Gaussianity imposed on the observed data. In our set-up, observations Yi(tu) 
from model (1) have random sources from both micro-structure noise £i(ta) 
and true log price X(t) given by model (2). The term f <r T dH s in X(t) 
does not obey sub-Gaussianity for common price and volatility models. Even 
though we assume normality on £i{tu), the observed data Yi(tn) cannot be 
sub-Gaussian for the price and volatility models. Consequently we employ 
realistic moment conditions in assumption Al, obtain convergence rates for 
the elements of T in Theorem 1 and derive subsequent convergence rates 
with a power of p for the regularized T in Theorems 2 and 3. 

Remark 5. For Gaussian observations, Cai, Zhang and Zhou (2008) 
have established optimal convergence rates for estimating a covariance ma- 
trix which is assumed to belong to a class of matrices satisfying the decay 
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condition. The convergence rate for the minimax risk based on the squared 
£2 norm is equal to the minimum of n - 2a /( 2a + 1 ) + k>gp/n and p/n. The result 
indicates that the convergence rate in Bickel and Levina (2008a) is subopti- 
mal. It is very interesting and challenging to find optimal convergence rates 
for the volatility matrix estimation problem in our set-up. 

5. Numerical studies. 

5.1. High-frequency data. The real data set for our numerical studies is 
high-frequency tick by tick price data on 630 stocks traded in the Shanghai 
Stock Exchange over 177 days in 2003. For each day, we computed the 
ARVM estimator corresponding to T defined in Section 3 where the pre- 
determined sampling frequencies were selected to correspond with 5 minute 
returns. This yielded 177 matrices of size 630 by 630 as ARVM estimators 
of integrated volatility matrices over the 177 days. The average of these 177 
matrices was then evaluated and denoted by 0. 

5.2. The simulation model. In our simulation study the true log price 
X(i) of p assets is generated from model (2) with zero drift, namely, the 
diffusion model, 

(12) dX(t) = of dB t , *e[0,l], 

where B^ = (B\t, . . . ,B p t) T is a standard p-dimensional Brownian motion, 
and we take <r t as a Cholesky decomposition of -y(t) = (7ij(i))i<«j'<p which is 
defined below. Given the diagonal elements of "f(t), we define its off-diagonal 
elements by 

(13) H 3 (t) = Mt)} li - jl ^ii(thn(t), l<i^J<P, 

where process n(t) is given by 
2u(t) _ 1 

(14) K (t) = ——r , du(t) = 0m[0.64-u{t)]dt + 0.118u{t)dW Kt , 

giM(t) _|_ ^ 

V 

(15) W K , t = V0MW^ t - 0.2 Bit/VP; 

i=l 

W® t is a standard 1-dimensional Brownian motion independent of Bf . Model 
(14) is taken from Barndorff-Nielsen and Shephard (2002, 2004). 

The diagonal elements of ~f(t) are generated from four common stochastic 
volatility models with leverage effect. The four volatility processes are geo- 
metric Ornstein-Uhlenbeck processes, the sum of two CIR processes [Cox, 
Ingersoll and Ross (1985) and Barndorff-Nielsen and Shephard (2002)], the 
volatility process in Nelson's GARCH diffusion limit model [Wang (2002)] 
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and two-factor log-linear stochastic volatility process [Huang and Tauhen 
(2005)]. Specifically, let U* = . . . , U^) T and U? = (Uf t , . . . , Vfr f be 
two independent standard p-dimensional Brownian motions which are in- 
dependent of Bt and W® t , and then define two p-dimensional Brownian 
motions, WJ = (Wl t , ■ • • , W^) T and Wf = (W? t , W^f , by 



(16) W>£ = ^ + Jl - Pplt, Wl = Pi B it + J I - ppl 



it' 

where we choose the following negative values for pi to reflect the leverage 
effect, 

-0.62, l<i<p/4, 
-0.50, p/A<i<p/2, 
-0.25, p/2 < i < 3p/A, 
-0.30, 3p/4<i<p. 
We generate 7«(i) as follows. 

(1) For 1 < i < p/A, 7ii(t) are drawn from the geometric Ornstein-Uhlenbeck 
model driving by W^ t [Barndorff-Nielsen and Shephard (2002)], 

(17) dlog7«(*) = -0.6(0.157 + log7«(t)) dt + 0.25 dW^. 

(2) For p/4 <i< p/2, ju(t) are drawn from the sum of two CIR processes 
[Barndorff-Nielsen and Shephard (2002)], 

(18) 7«(t) = 0.98(ui,t + U2, t ), 

where an d V2,t obey two CIR models driving by Wf t and Wf t , respectively, 

(19) dv lit = 0.0429(0.108 - v 1>t ) dt + 0.1539^7 dW} t , 

(20) dv2,t = 3.74(0.401 - v 2 ,t) dt + 1.4369^7 dW* t . 

(3) For p/2 < i < 3p/4, ju(t) are drawn from the volatility process in 
Nelson's GARCH diffusion limit model driving by W^ t [Barndorff-Nielsen 
and Shephard (2002)], 

(21) d lu {t) = [0.1 -7i<(t)]£ft + 0.27ft (t)dWiJ t . 

(4) For 3p/4 < i < p, ja(t) are drawn from the two- factor log-linear stochas- 
tic volatility model driving by W} t and Wf t [Huang and Tauhen (2005)], 

(22) lu (t) = e- 6 - 8753 s-exp(0.04t; M + 1.5^ - 1.2), 
where 

dv 1:t = -0.00137t^,t dt + dW} p 

(23) dv 2 ,t = -1.386«2 )f dt + (1 + 0.25u 2 ,t) dWf p 

fe u , if u<log(8.5), 

s-expw | 8 _5| 1 _ log ( 8 _ 5 ) +u 2/ log ( 8 5 )}i/2 ) if M >log(8.5). 
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With 7jj(i) generated from above stochastic differential equations, we multi- 
ply lu{t) by 1000#i where Oi are the ordered (from the largest to the smallest) 
diagonal elements of defined in Section 5.1 as the average of 177 daily 
ARVM estimators for the high-frequency data from the Shanghai market. 
The adjustment is to roughly match simulated 7n(i) with the magnitudes 
of the diagonal elements of the ARVM estimators for the stock data. 

Finally the high-frequency data Yi(ta) are simulated from model (1) with 
noise £i(ta) drawing from independent normal distributions with mean zero 
and standard deviation of three choices: 0.002\/#i, 0.127\/#i and 0.2\/#i 
which correspond to low, medium and high noise levels. The standard de- 
viation is chosen to reflect the empirical fact that relative noise level found 
in high frequency data typically ranges from 0.001% to 0.01% with 0.001% 
for individual stock and 0.01% for stock index. In our simulated example, 
the average volatility is around 1000#j, and thus the three noise standard 
deviations are translated into 0.002%, 0.004% and 0.065% of the average 
volatility or relative noise level, respectively. 

5.3. The simulation procedure. We need to simulate n values for the 
price and volatility processes at tg = £/n, £ = 1, . . . ,n. The procedure be- 
gins with the generation of matrices "f(te). First we use normalized partial 
sums of i.i.d. standard normal random variables to simulate four indepen- 
dent Brownian motions, a standard one-dimensional Brownian motion W® t 
and three standard p-dimensional Brownian motions, B^, and Uf , and 
compute W Kl t £ , W£ and according to (15) and (16). We then use the 
Euler scheme to simulate n(ti) from (14) with W K t t and 7n(i^) from (17)- 
(23) with corresponding components of and Wf . With available K{tt) 
and 7ii(^), from (13) we evaluate off-diagonal elements Jij(ti), i^j- To 
speed up the simulation of ~f(tg) = (jij(ti)), we have utilized the following 
tricks in R programming: (i) group all p diagonal elements 711 (i^), . . . , 7pp(^) 
into four vectors of dimension p/A with each vector drawing from the same 
volatility model and update each vector in the Euler scheme; (ii) calculate 
the matrix product of column vector (711 (fy), . . . ,7pp(^)) T and row vector 
(711(^)1 ■•■ ->lpp{ti)) and then take the element by element square root of 
the obtained matrix; (iii) utilize the Toeplitz matrix operation to evaluate 
K(teY~ J ; (i y ) use the- matrix operation to compute element by element mul- 
tiplication of the two matrices yielded in steps (ii) and (iii). 

We take a% t as a Cholesky decomposition of 7 (if) and compute true log- 
price X(^) by 

X te = X^_ x + [cr^_J T [B^ - B t< _J. 

Finally, data Yi(tg), i = 1, . . . ,p, I = 1, . . . , n, are obtained by adding to Xi(ti) 
normal noise e,(^) of mean zero and standard deviation 0.002-^/^, 0.127\/$i 
and 0.2^ l /~6i for the cases of low, medium and high noise levels, respectively. 
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After the simulation has generated volatility matrices ~f(te), log-price val- 
ues Xi(ti) and synchronized noisy data i^(t^), i = 1, . . . ,p, tg = £/n, i = 
1, . . . , n, we numerically evaluate integrated volatility matrix T by Y2t=i l(tz) l n i 
compute ARVM estimator T according to (8) and calculate its banding S&fT] 
and thresholding 7^ [T] as described in Section 3.2. Note that there is no 
need to store matrices *y(ti) in the programming loop of £ = 1, . .. , n, be- 
cause at each t step of the loop all we need is to record Xiitg) and Y%{ti) 
and update the partial sum $Zr=iT(*r) to the current step for the purpose 
of evaluation of T by the average of "/(t^) at the end of the loop. Doing so 
will save huge computer storage space and prevent the simulation program 
from running out of computer memory. 

We repeat the whole simulation procedure 500 times. The mean square 
error (MSE) of a matrix estimator is computed by averaging ^-norms of 
the differences between the estimator and T over 500 repetitions. We use 
the MSEs of T, 0&[T] and 75^ [T] to evaluate their performances. For estima- 
tors Bb\T] and 7^j[r], we select the values of b and w by minimizing their 
respective MSEs. 

We generate nonsynchronized data as follows. Instead of generating obser- 
vations for the processes at n time points, we simulate ~f(te), Xi(tg), Yiitg), 
i = l,...,p, at 3ro time points ti = £/ (3n), 1=1,..., 3n. Grouping together 
three consecutive time points we divide the 3n time points tg into n groups 
{t$ r - 2)i3r-i)^3r}j r = 1,. . . ,n. For the ith asset, we select one time point at 
random from each group; from the simulated 3n values of Yi(ti) we choose n 
values corresponding to the selected time points; we use the n chosen values 
to form noisy observations for asset i. The selection procedure is applied to 
p assets for obtaining data on all assets. Because of random selection, the 
obtained data are nonsynchronized and have n observations for each asset. 
As in the synchronized case, the true T is computed by X^r=i 7(*3r)/ n - But 
we use the generated nonsynchronized data to evaluate T, Bb[T] and 7^[r], 
where the values of b and w are selected as before by minimizing their re- 
spective MSEs. Again we repeat the whole simulation procedure 500 times 
and evaluate MSEs of T, B&pT] and [r] based on the 500 repetitions. 

5.4. Simulation results. In the simulations we have tried on different 
combinations of values for n and p. This section displays the simulation 
results and reports findings based on n = 200 and p = 512. Figure 1 plots 
the sample paths of n(t) corresponding to six initial values k(0) = 0.537, 
0.762, 0.905, 0.964, 0.980, 0.995. The fixed initial values were chosen to 
obtain various patterns for T. The figure shows that process K(t) is heavily 
influenced by its initial value, and its whole path stays within a narrow 
band around the initial value. Figure 2 plots the images of T corresponding 
to these initial values. The image plots indicate that the elements of T 
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(a) k(0)=0.537 (b) k(0)=0.762 




50 100 150 200 50 100 150 200 

t t 



(c) k(0)=0.905 (d) k(0)=0.964 




50 100 150 200 50 100 150 200 

t t 



Fig. 1. Sample path plots for the process under different initial values, (a)-(f) cor- 
respond to the sample paths of n(t) with re(0) = 0.537,0.762,0.905,0.964,0.980,0.995, re- 
spectively. 



geometrically decay from its diagonal with rate n(t). The significant elements 
of r fall into a band along its diagonal, and its off-diagonal elements outside 
the band are negligible. For small n(t), the decay is very fast, and the band 
is very narrow. As K,(t) increases, the decay gets slower and slower, and the 
band becomes wider and wider. As a result, T becomes less sparse and is 
more diffuse along its diagonal. As we will see later, it will be more difficult 
to estimate T. Note that T is inhomogeneous, and the band is wider at lower 
right corner than at upper left corner. This is clearly demonstrated in Figure 
2(f) for the case with k(0) = 0.995. 
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(a)rwithK(0)=0.537 



(b)rwithK(0)=0.7G2 




200 300 
column 
(c)rwith k(0>0.905 




200 300 
column 
(e) r with c(0)=D.98 
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200 300 
column 
(d)rwith ic(0)=0.964 



400 



500 




200 300 
column 
(f) r with tc(D)=0.995 




Fig. 2. Image plots of matrix V generated with different initial values for n{t). (a)-(f) 
correspond to the images of V with k(0) = 0.537, 0.762, 0.905, 0.964, 0.980, 0.995, respec- 
tively. 



The MSEs of T, 0&[T] and %a [T] are computed over combinations of six 
initial values of k(0), three noise levels and two values of K. The numerical 
results are summarized in Table 1 for the case of synchronized data. 

The simulation results indicate that for the given T, BARVM estimator 
has smaller MSE than the corresponding TARVM estimator. This is due to 
the fact that the decay pattern of T is the ideal case for banding. However, 
if we randomly permute the rows and columns of T, the resulting matrix 
no longer decays along its diagonal but retains the same sparsity. Such a 
permutation corresponds to a random shuffle of the list of stocks. For each 
realization matrix of T displayed in Figure 2, we take a random permutation 
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Table 1 

MSEs ofT, Bb[T] and T^fT] for noisy synchronized data 



k(0) 



Noise level 


Estimator 


K 


0.537 


0.762 


0.905 


0.964 


0.980 


0.995 


Low 


f 


1 


5.595 


6.039 


7.511 


9.959 


12.259 


18.270 


Low 


f 


5 


10.186 


11.80 


14.85 


18.79 


21.488 


31.85 


Low 


Bb\T] 


1 


0.663 


1.195 


3.154 


6.000 


8.018 




Low 




1 


0.845 


2.456 


4.595 


7.457 


10.928 




Low 


B b [T] 


5 


1.085 


2.008 


5.075 


10.160 


15.299 




Low 




5 


1.744 


3.077 


6.855 


12.683 


18.111 




Medium 


f 


1 


5.641 


6.097 


7.649 


10.479 


12.280 


18.398 


Medium 


f 


5 


10.229 


11.81 


14.74 


19.17 


21.851 


31.95 


Medium 


B b [f] 


1 


0.694 


1.224 


2.785 


6.442 


9.059 




Medium 




1 


0.871 


2.466 


4.101 


7.680 


12.058 




Medium 


Bb[t] 


5 


1.093 


2.022 


4.499 


10.302 


15.384 




Medium 




5 


1.757 


3.083 


7.014 


12.742 


19.083 




High 


f 


1 


5.769 


6.234 


7.717 


10.521 


12.942 


19.26 


High 


f 


5 


10.271 


11.86 


14.89 


18.85 


21.968 


33.54 


High 


B b [t) 


1 


0.723 


1.258 


3.105 


6.298 


10.094 




High 


r ro [f] 


1 


0.896 


2.429 


4.043 


8.765 


12.844 




High 


B b [T) 


5 


1.077 


2.125 


4.601 


10.042 


16.041 




High 




5 


1.628 


3.101 


6.943 


12.998 


19.194 





of its rows and columns. Figure 3 plots the images of the obtained matrices. 
The plot shows that while the significantly large elements are scattered all 
over the place and the decay patterns completely disappear, the sparsity 
remains unchanged. For such randomly permuted T, the TARVM estimator 
maintains the same performance, while the BARVM estimator performs very 
poorly. 

The simulation results show that the MSEs increase in k(0). This can 
be explained by the fact that as /-t(O) increases, T decays more slowly and 
becomes less sparse, and thus it is more difficult to estimate T. In fact, for 
k(0) = 0.995, T is so diffuse that banding and thresholding result in almost 
no reduction in MSEs. In other words, because T is not even nearly sparse, 
when applying banding and thresholding procedures to T, we select almost 
all elements in T, and the resulting BARVM and TARVM estimators are 
basically the same as T. Also it is interesting to see that the MSEs increases 
much faster in k(0) than in noise level. For the chosen range of k(0) and the 
specified noise levels, /c(t) governing the sparsity has more influence on the 
MSEs than noise. 

The simulation results suggest that for all three noise levels, the estimators 
with K = 1 have better performance than with K = 5. We have tried to 
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Ca)rwithK(0)=0.537 



(b)rwithK<p)=0.762 





200 300 
column 

(c) T with tc(0>3.905 



100 



200 300 
column 

(d) r with k(0)=0.964 



400 



500 




200 300 
column 



Fig. 3. Image plots of the matrices obtained by randomly permuting rows and columns 
of Y in Figure 2. (a)-(f) correspond to the images of randomly permuted V with 
k(0) =0.537,0.762,0.905,0.964,0.980,0.995, respectively. 



increase noise standard deviation up to 0.6\/#i and found that for noise 
standard deviation from 0.4\/#j on, the estimators with K = 1 perform worse 
than with K = 5. We have also tried on different values for K and obtained 
the similar results. Like the TSRV estimator in Zhang, Mykland and A'it- 
Sahalia (2005), the role of K is to balance subsampling and averaging in 
r for the purpose of noise reduction. Its effect is clearly demonstrated by 
asymptotic analysis as illustrated in Theorem 1 and Remark 1. However, the 
simulation results imply that it needs large noise to manifest numerically the 
benefit of choosing K greater than 1 in the construction of T. 
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Table 2 

MSEs ofT, Bb[r] and 7to[T] for noisy nonsynchronized data 



«(0) 



Noise level 


Estimator 


K 


0.537 


0.762 


0.905 


0.964 


0.980 


0.995 


Low 


f 


1 


12.86 


16.99 


27.13 


48.37 


68.77 


152.75 


Low 


B b [T] 


1 


3.305 


4.778 


8.718 


20.375 


34.29 


85.10 


Low 




1 


3.842 


5.281 


13.38 


30.29 


46.11 


121.15 


Medium 


f 


1 


12.98 


17.10 


27.15 


48.57 


69.23 


153.57 


Medium 


B b [t] 


1 


3.911 


4.834 


8.759 


22.49 


36.04 


92.70 


Medium 




1 


3.374 


4.728 


11.662 


30.53 


50.64 


123.22 


High 


f 


1 


13.16 


17.15 


27.50 


48.07 


71.44 


151.85 


High 


B b [t] 


1 


3.443 


4.776 


8.779 


21.946 


42.23 


69.27 


High 




1 


3.997 


4.902 


11.70 


29.98 


56.27 


100.13 



Table 2 displays the MSEs of F, Bb[T] and 7^[r] for noisy nonsynchro- 
nized data. The comparison of Tables 1 and 2 shows that the MSEs in Table 
2 are much larger than the corresponding ones in Table 1 for all three noise 
levels and six values of k(0) considered. The phenomenon suggests that the 
contribution in MSEs due to nonsynchronization dominates over that due 
to noise. We note particularly that even for the case of ft(0) = 0.995 where 
r is very diffuse and regularizations have little improvement on T for the 
synchronization case, regularizations in the non-synchronization case still 
improve T with sizable reduction on MSEs. Similar to the synchronized 
case, the estimators with K = 1 have smaller MSEs than with K = 5 for all 
three noise levels. Since nonsynchronization significantly inflates MSEs, it 
requires very large noise to manifest numerically the effect on MSE reduc- 
tion by using K > 1 in the construction of T. Apart from the phenomenon 
due to nonsynchronization, Table 2 exhibits the similar MSE patterns as 
Table 1. 

5.5. An application to the stock data. We applied the proposed method 
to the high-frequency stock price data from the Shanghai market. Denote 
by Tj, i = 1, . . . , 177, the daily ARVM estimators obtained in Section 5.1. 
For each of Tj, we computed its eigenvalues and collected them as a set. 
The eigenvalue sets for Tj have wide ranges, with some very big positive 
values for the largest eigenvalues and many negative values for the smallest 
eigenvalues. As stocks have no natural ordering, the decay assumption is 
not realistic for volatility matrices, and banding may not be appropriate for 
Tj. We regularized Tj by thresholding. The threshold wi applied to Tj was 
calibrated through the quantiles of the absolute entries of Tj. For a G (0, 1), 
let 

T-&i,a be the o-quantile of the absolute entries of Tj. Then we reduced 
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threshold selection to the selection of a. Define 

176 

(24) A(a)=^||f i+1 -r roi;a [f i ]|||. 

i=i 

We selected the value of a by minimizing A(a) over a G (0, 1). The threshold 
selection procedure was motivated as follows. Because the ARVM estimators 
were evaluated at the daily level where stationarity is a reasonable assump- 
tion on volatility in financial time series, we predicted one day ahead of 
the daily realized volatility matrix by the current thresholded daily real- 
ized volatility matrix with prediction performance measured by the ^-norm 
of the prediction error. Thresholds z&i a were then selected by minimizing 
A (a), the sum of the squared ^-norms of the prediction errors over 176 
pairs of consecutive days. Our calculation resulted in selecting 0.95 for a. 
We thresholded Tj by 0^,0.95, namely, for each of Tj we retained its top 
5% entries in magnitude and replaced the rest by zero. We evaluated the 
eigenvalues of T Wi 95 . Thresholding attributes to significantly narrowing 

down the eigenvalue ranges. Since many Tj had negative smallest eigenval- 
ues, we truncated the negative eigenvalues at zero and plotted in Figure 4 the 
corresponding largest eigenvalues of Tj and T UJi 95 [Tj]. The plot shows that 
the reductions of the largest eigenvalues due to thresholding are over 50% 
for many days. Since eigen based analyses like clustering analysis, principal 
component analysis and portfolio allocation are routinely applied in practice, 
the study indicates that blind applications of such analyses to large realized 
volatility matrices without regularization may end up with very misleading 
conclusions. 

6. Proofs of Theorems 2 and 3. Denote by C a generic constant whose 
value is free of n and p and may change from appearance to appearance. Op 
and op denote orders in probability as both n and p go to infinity. 

We will prove Theorem 1 in Section 7. This section assumes (11) in The- 
orem 1. We use it to establish Theorems 2 and 3 by following Bickel and 
Levina (2008a, 2008b). 

Proof of Theorem 3. Using the relationship between E 2 - and £oo- 
norms, triangle inequality, and decaying condition (10), we have 

||£ 6 [f]-r|| 2 <||£ 6 [rVr|U 

(25) 

00 2M 

(26) \\B b [T] -rlloo < max V |r«| < 2M V k^ 1 < b'", 

i<i<p ^ J ^ a 

\i-j\>b k=b+l 

(27) |[jB 6 [r] — jB 6 [T] [|oo < (26 + 1) max ify-Iyl- 

\i-j\<b 
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day 

Fig. 4. Plots of the largest eigenvalues of daily realized volatility matrices and the 
thresholded daily realized volatility matrices for the stock data from the Shanghai mar- 
ket. The solid line represents the largest eigenvalues of daily realized volatility matrices, 
and the dotted line corresponds to the largest eigenvalues of the thresholded daily realized 
volatility matrices. 



From (11) we get 



P\ max \Tij — Tij\ > d 
<\i-j\<b 



< 



J2 P(|f y -r y |>d) 

\i-j\<b 



p(26 + l) ~ Cpb4 

< -75 , max E[\Tij - Tij\ p ] < R . 

dP \i-j\<b dP 

Combining above probability inequality with (27) we obtain 



(28) 



P(\\B b [T]-B b [T]\\ 00 >d)<P( max \T i:j -T ij \> d/(2b + l] 

\\i-j\<b 



dP 



dP ' 



where the last inequality is due to the fact that for the selected b in the 
theorem, 



P 
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Collecting together (25), (26) and (28) and taking 

d = 2di6- Q ~(pe£) Q/(a/3+/m) , 

we conclude 

P(\\B b [t } - TIU > d) < P{\\B b [t\ - B b [T]\\oo > di6~ a ) 

+ p{\\B b [T]-v\\ OQ >d l b- a ) 

<^ + P{M>ad 1 /2) 

C 2E\M] 
< — 5- H > 0, as di — > co. 

This completes the proof of Theorem 3. □ 

We need the following lemmas for proving Theorem 2. 

Lemma 1. Assume that T satisfies sparse condition (10) and w is cho- 
sen as in Theorem 2. Then for any fixed a > 0, 
p 

(29) max V Ir^-linr;,-! < aw) < a 1 ' 6 Mtt( P )w 1 ^ 5 = O p (tt(p)w 1 " 5 ), 

Ki<p — ' 
" - v j=l 

P 

(30) max V*l(|r«| > aw) < a~ s Mir(p)w~ s = P (ir(p)w~ 5 ). 

Ki<p — ' 

Proof. Simple algebraic manipulation shows 

v 

max > I 7 - 1 1 ( I 1 < aw) 

p 

< (aw) 1 " 6 max ^ |r ijr - 1 5 1 (IT^ | < aw) 

1 - i - p j=i 

< a l ~ 5 w l ~ 5 M^(p) = O p {tt{p)w 1 ~ 5 ) 

which proves (29). (30) is proved as follows: 
p p 
max V l(|Ty | > aw) < max V[|r ii |/(aro)] 5 l(|r ij | > aw) 

KKp' — * Kl<v — ' 



i=i j=i 



p 

5 



< (aw) 6 max I Ti 

~ 3 = 1 

< (aw)~ 5 M7r(p) = Op{tt(p)w- 5 ). □ 
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Lemma 2. Assume that V satisfies sparse condition (10), zu is chosen 
as in Theorem 2 and (11) is held. Then 

(31) max \fij - Tij\ = P (e n p 2/p ) = o P (w), 

±<i,j<p 



(32) P\ max Vl{|fi,- -Ti,! > w/2} > ) =o(l), 

\ Ki<p — ' / 
P 

max V l(|fy| > zu, \Yij\ <zu)< 2 5 Mtt(p)zu~ s + o P (l) 

l<i<p ^ — ' 

i=i 

(33) 

= P (7r(p)zu- s ). 

PROOF. Taking d = d\p 2 /^e n , applying the Markov inequality and using 
(11) we obtain 

P( max |r y - Ttjl >d)<^ P(\Tij ~ r«| > d) < = -g -> 

as n,p — >• oo and then di — > oo. This proves (31). Using above inequality we 
have 

max Hftij ~ r ijl > W 2 } > 0^ < P^max |f tj - Tij\ > zu/2 

2PCM_2PC_ 
zu? h? ~* 

since h n # — > oo as n,p — > oo which proves (32). To show (33) we employ (30) 
and (32) to get 

v v 
max \ l(\Tij\ > zu, \Tij\ < zu) < max > 1 ( | •/ 1 > zu, < zu/2) 
i=i j'=i 



+ max y l(|f > zu,zu/2 < |Tj,-| < ro) 

P 

< max N l(|rjj — Tij\ > zu/2) 

l<i<p ~ — ' 



3=1 

P 



max Vl(|r„| >w/2) 
±<%<p * — » 
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<o p (1) + 2 5 Mtt( P )w~ s 

= O p {tt( P )zu~ s ). □ 

Proof of Theorem 2. With the relationship between l 2 - and £oo- 
norms and the triangle inequality, we have 

\\T„\r] - r|| 2 < ||r ro [f ] - t w [t\\\ 2 + \\t w [t\ - v\\ 2 
< ||7^[r] - T^triHoo + ||7^[r] - riioo. 

Lemma 1 implies 

p 

||T ro [r] - rile = max V |r^|i(|r^-| < w ) = o P ^{ P )w l - s ). 

The theorem is proved if we show that ||7^[T] — T^r] is also of order 
w 1 ~ S Tr(p) in probability. Indeed, with simple algebra and Lemmas 1 and 2 
we establish it as follows: 

v 

1 1 7^ [f] - 7^ [r] | |oo < max y \tij - ri,-|l(|fi,-| > w, \Tij\ > w) 

Ki<p * — * 

P 

+ max |fj,|l(|fjj| > w, \Tij\ < w) 

Ki<p L — 4 
P 

+ max ^2 |r*j|l(|rij| < ec, \Tij\ > w) 
±<i<P. =l 

p 

< max \Tsj — TsA max > lflrjJ > w) 

i=i 

p 

+ max \Tij\l(\Tij \ <w) 

Ki<p L — 4 

P 

+ max | Ta — Ta\ max > lflFiil > iFtA < v&) 

l<i,j<P i<i<P*-^ 

i=i 

p 

+ w max \ 1 (|r»,- 1 > eu) 

Ki<p ^ — ' 

= o P (zi7)Op(7r(p)^' 5 ) + Op(tt(p)ct 1 - 5 ) 

+ op(-n7)Op(7r(p)ro~' 5 ) + wO p (tt(p)zu~ s ) 
= O p (tt(p)w 1 - 5 ), 



26 Y. WANG AND J. ZOU 

where the orders in line five of the six equation array are from (29)-(31) 
and (33). □ 

7. Proof of Theorem 1. 

7.1. Decomposition of V defined in (6). We decompose T = (r^ ) into 
three parts with 12 terms in this subsection and show that all parts meet 
condition (11) in next four subsections. Denote by 

Y* = (Y^), . . . , Y p (r* r )) T , X r fe = (Xi(t£ p ), . . . , X p (r^.)) T , 

£ r = ( £ l( T l,r)> ■ ■ ■ > £ p( T p,r)) T 

the vectors formed by data, true log price and noise at time points prior to r r 
for all p assets. Note that since r^ r , . . . ,Tp r are not equal, these vectors are 
nonsynchronized in the sense that their coordinates are the corresponding 
processes evaluated at different time points. From (6), we have 

^ K m 

f 4SE^- Y ?-i][ Y ?- Y ^ T 

k=l r=l 
^ K m 

= ^EE^ ~ X r-1 + £ r ~ £ r-l][ X r _ X r-1 + £ r ~ £ r-l\ T 
k=l r=l 

_. K m 
k=l r=l 

+ [x£ - X.^_ 1 ][e 1 ^ - el_ l ] T + [e r - e*_*^\2L r - X^'„ 1 ] T } 

..Km 

= k E Et X * " X r-iH X r " ^r-if + G(l) + G(2) + G(3), 

k=l r=l 

where G(l), G(2), G(3) are sums involving with noise components and will 
be handled in Section 7.2, and the first term corresponds to the average 
realized volatility estimator based on noiseless nonsynchronized true log 
prices which will be decomposed further below. Since Xj? and X^_ x 
are evaluated at time points rf r and rj c r _ 1 , and condition A2 indicates 
T ir-i — T r-i < T ir — T r > we hisert synchronized true log prices X(t^) and 
X(r^_ 1 ) in between Xj? and X,| i :_ 1 and write 

Xj? — XjLj = X^ — X(r^) + X(r^) — X(r r fc _ 1 ) + X(r,. fe _ 1 ) — X^_ x . 
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Using the above expression to expand (Xj? — XJ5_i)(X* — X*_ 1 ) , we obtain 
the following decomposition of the first term on the right-hand side of (34): 

_. K m 

^EE[ X "- X -l)[ X "- X r-l] T 
fc=l r=l 

_. K m 

= ^ E Ei x ( r ") - x ^)] p^* ) - x (^-i)] T 

k=l r=l 

+ [X r fc -X(r r fc )][X*-X(r r fc )] T 
+ [X^) - X r fe _ 1 ][X(r r fe _ 1 ) - Xjuf 
+ [X r fc -X(r r fe )][X(r^ 1 )-Xt 1 ] T 
+ [X(r^ 1 )-Xt 1 ][X^-X(r r fc )] T 
(35) + [X* - X(r r fc )][X(r r fc ) - X^F 

+ [X(r r fc )-X(r^ 1 )][X^-X(r r fc )] T 
+ [X(r r fc _ x ) - XjLJtXtf) - X(r r fc _ 1 )] T 
+ [X(r r fc ) - X^)]^^) - X^f} 
= V + H(1) + --- + H(8), 

where V corresponds to the average realized volatility estimator based on 
synchronized true log prices X(r^'), and H(l), . . . , H(8) are contributions 
due to nonsynchronization in true log prices. Then from (34) and (35) we 
decompose r — T = T — 2mr\ — T into three parts with 12 terms, 

f - T = [G(l) - 2mr) + G(2) + G(3)] + [V - T] 

(36) 

+ |H(l) + ... + H(8)]. 

Propositions 1-3 in Sections 7.2-7.4 below, respectively, establish orders for 
the /3th moments of the three parts on the right-hand side of (36). Putting 
these order results together and applying the Holder inequality, we immedi- 
ately prove Theorem 1. 

7.2. Analyze Gs for the effect of micro- structure noise. Let 

G(l) = (Gtf(l)), G(2) = (Gtf (2)), G(3) = (G y (3)). 
The purpose of this subsection is to show 

Proposition 1. Under the assumptions of Theorem 1, we have for 1 < 
i,j<P, 

E[\Gij(l) - 2mrfil(i = j) + G -(2) + Gyfflf] < C[{Kn~ 1 / 2 )^ + K~^\ 
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We prove the proposition by deriving the orders for the /5th moments of 
Gij(l) — 2mrjil{i = j), Gij(2), G^(3) and 2m(ffi — 7/j) in Lemmas 3-5 below. 

Lemma 3. Under the assumptions of Theorem 1, we have for l<i,j< 

P, 

£[|C?y(l) - 2m Vi l(i=j)f] < C{Kn- l l 2 yP. 

Proof. From the definition of G(l) in (34), we have 
Gij(l) - 2mr]il(i=j) 

^ K 771 

= rY1 J2&( T i,r) ~ £ i( T i,r-l)][£j(4 r ) ~ e^T^)] - 2 m l{i = j) 
k=l r=l 

K 771 

k=l r=l 

+ etfrfr-ite^,..!) - ViHi = j) 

~ £ i( T i,r) £ j( T j,r-l) ~ £ i( T i,r-l) £ j( T j,r)} 
= — [R± + i?2 — R3 — Ri] ■ 

Note that rf r and Tj r are equal to some tn and tj£, for fixed i, £i(ta) are 
i.i.d., and for i^j, {^(tii)} and {£j(tjt)} are independent. Thus, R\ and 
i?2 are the sums of Si{ta)sj(tje), R3 is the sum of Eiitu) multiplying by the 
lagged Sj(tji), and i?4 is the sum of £j(tji) multiplying by the lagged £i(tn). 
As a result, all four Rs are martingales. We apply the Burkholder inequality 
[Chow and Teicher (1997), Section 11.2] to Rs and use the moment condition 
on £i(tj£) in condition Al to obtain 

E[\G ij (l)-2mr H l(i = j)f] 

K m 

<C^(^/ 2 - 1 ^^i?{| £i (r^(rj; r )-r / a(i=i)| /3 
k=l r=l 

+ |ei(T&._ 1 )ej(7£ r _ 1 ) - r&l(* = i)|^ 
+ |ei(r iir )e i (T J - r _i)| /3 

+ |e i (T i)r _i)e i (r J> )|^} 

< CK^{Kmf/ 2 {E[\e i {t i>1 )s j {t jyl ) - Vi l(i = 

+ E[\e i (t i , 1 )f]E[\e j (t j , 1 )\^ 

< C(m/K f' 2 < CiKn- 1 ' 2 )-? 
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which proves the lemma. □ 

Lemma 4. Under the assumptions of Theorem 1, we have for 1 < i,j < 

V, 

E\\G l0 {2)f\ < CK-PI 2 , £[|Gy(3)|"] < CK-^ 2 . 

Proof. Because of similarity, we provide arguments only for the first 
result. Simple algebra shows that 

A m 

k=l r=l 
A m 

( 3? ) =EEW)- I '('{M)fe(i) 

fc=l r=l 
A m 



fc=l r=l 
X m-1 

X] E [ 2Xi ( T ^r) - ^i( T i>-l) - ^i( T i>+l)] £ j( T .7> 
fe=l r=l 

A' 



' j,mJ 

k=l 



fc=i 

= i?5 + i?6 — ^?7- 

Conditional on the whole path of A^, i?5, i?6 an d R7, all are sums of inde- 
pendent random variables £j(tji). Thus 

E[\R 5 f\X] <C{Kmfl 2 - 1 

A m-1 

X £ E |2^(7jr)-^(^r-l)-^(^rH.l)l /, *i(^r)| / '] 
fc=l r=l 

A m-1 



fc=l r=l 



Taking expectation in above inequality we get 

A m-1 

E[\Rs\P] < C{Kmfl 2 ~ l E E E \ 2X Mr) ~ Urtr-l) ~ U^ r+ l)f 



k=l r=l 



30 Y. WANG AND J. ZOU 

K m— 1 ~i-k 



<C{Kmf /2 - l ^^E f Ir (Ti(s)dB s - f l ' r+1 ai(s) dB 

k = l r = l l ' T i,r-l ^ T i.r 

K m—1 „ T k ,. T k 

(38) < C{Kmfl 2 - 1 Y, E E / ^ ^ s ) ds + / l ' r+1 Jn(s)ds 

7„_1 „_1 JtK 1 JtK 



k=l r=l 
if m-1 



i.r — 1 i.r 



< Cf&f" 1 ^ m"^ 2 max ^(s)!^ 2 

fe=l r=l O^ 5 ^ 1 

<C(Km)^ 2 m-^ 2 = C^/ 2 , 

where the third inequality is due to an application of the Burkholder in- 
equality [He, Wang and Yan (1992), Section 10.5 and Jacod and Shiryaev 
(2003), Section 7.3] to the stochastic integrals. 
Similarly, we have 

K 

k=l 
K 



< OK?! 2 - 1 \ X i(T?, m ) ~ Urtm-l)?, 
k=l 
K 

(39) E[\Ref) < CK^Ymii^J ~ < CK^m^' 2 , 

k=l 

E[\R 7 f\X] < CK?' 2 - 1 Y " Urt^EWe^f] 



k=l 

K 



<CK^Y\ X i(rt,x)-Mrt,o)f, 

k=l 

(40) E[\R 7 f] < CK^ 2 mT p/2 . 

Collecting (37)-(40) together, we prove the result for Gy(2). □ 

Lemma 5. Under the assumptions of Theorem 1, we have for 1 <i <p, 
E[\m{rji- m )f]<C{Kn- l / 2 r^ 

Proof. Taking K = 1 in the proofs of Lemmas 3 and 4, we have that 
conditional on the whole path of Xi(t), 
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E[\Vi~Vif] < Cnf[ Y,E{[Xi(tii) -X^-i)] 2 ?} + 



<Cn^[Y,Cn^ + n ^ 2 )<Cn 



-0/2 



which immediately shows the lemma as n = mK. □ 

7.3. Analyze V for average realized volatility based on synchronized true 
log price. Let 

m 
r=l 

(41) 

[X ) X]W = ([X i ,X i ]W) ) 

where [Xi,Xj]^ is realized co- volatility between Xi(t) and Xj(t) based on 
true log prices at the same grid times , r = 1, . .. , m, and V is equal to 
the average of K realized volatility matrices based on synchronized true log 
prices X(r^), r = 1, . . . ,m. Then from (35), we have that 

(42) V = (^-) = ^E[X,X]( fe ). 



fc=i 



Proposition 2. Under the assumptions of Theorem 1, we have for 1 < 
i, 3 <P, 

E(\V i3 -T l:j f)<Cm-^ 2 . 



Proof. Note that 



\V<--T--\ P 



k=l 
K 



k=l 
K 



<^El^i,^] (fc) -r, 



k=l 



Lemma 6 below gives the moment inequality for each pQ, Xj]( k \ from which 
we immediately show 

1 K 



k=l 
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□ 



Lemma 6. Under the assumptions of Theorem 1, we have that for 1 < 
k< K and 1 < i, j < p, 

E(\[Xi,Xj]to - Ti^KCm-V 2 . 

Proof. Note that [X,X]( fc ) is defined by X(r^) for r = 1, . . . ,m while 
k is fixed. For fixed k, r*, r = 1, . . . ,m, are equally spaced grids on [0, 1]. 
As k is fixed throughout the proof of the lemma, to simplify notation we 
write as r r , r = 1, . . . , m, in the rest of the proof. The effects of diffusion 
drifts on realized volatility are in high and negligible orders. For simplicity, 
we assume \x t = 0. Observe that 

[X^X^-T^ 

= yVXi(r r ) - Xiirr-x^lX^Tr) - X^Tr-x)} - / lij{t)dt 
r=l Jo 

(43) 

= jN I r (visfdBs f 1 (a Js fdB s - f " -fij(t)dt\ 

r= l Wr r _i Jt t -\ Jr r -i ) 

in 

— ^ ^ {Dir ("Tr ~)Djr (^r) [-^ir j -^j r ] T r } j 
r=l 

where <Tj s = (<rij jS , . . . , a p i :S ) T is the ith column of <x s , 



7ij( s ) = (<Tis) T (7js = y^ J Q~ai,sO~aj,s, 



a=l 



D ir (t) = Xi(r r ) - Xi(r r -i) = (a is ) T dB s , t€[r r -x,r r ) 

J T r _l 

and Di(t) = Di r (t) for t 6 [r r _i,r r ). Applying Ito's lemma, we obtain the 
following stochastic integral expression: 

(44) D ir {T r )D jr {T r ) - [D ir ,D jr ] Tr = / {D ir {s)a is + Dj r (s)<7j S } T dB s 
which has quadratic variation 

(45) I " {Dlishuis) + D] r (s) lj3 (s) + 2D ir {s)D jr (s) lij {s)} ds. 
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Thus from (43)-(45) we immediately show that [Xi, Xj]W — Tij has quadratic 
variation 

{I%(s)iu(8) + I%(8)v j (s) + 2D i (s)D j (s)i ij (s)}ds 

(46) 

<2^ {DUshn(s) + D](s) 7n ( S )}ds, 

where the Cauchy-Schwarz inequality is employed to show that |7ij(s)| 2 < 
lu( s )7jj( s ), an d hence 

2\D l {s)D ] {s) ll3 {s)\<D%s) lii {s) + D]{s) 1]3 {s). 

Applying the Burkholder inequality [He, Wang and Yan (1992), Section 
10.5 and Jacod and Shiryaev (2003), Section 7.3] to the stochastic integral 
expression of [Xi,Xj]^ — given by (43)-(44) and using (44)-(46), we 
have 

f ri m 

(47) \ J ° 

<C I E{\DKs)lii(s) + D%s) l3] is)fl 2 }ds. 
Jo 

On the other hand, simple algebraic manipulations derive the inequalities 

\DHshu(s) + D%s) ljj {s)f' 2 < 2^-\\DKsHi{s)f" 

(48) 



(49) E{\Df(s) 7ii (s)f/ 2 } < yjEUDiWmEihMlP}, 

and we apply the Burkholder inequality [He, Wang and Yan (1992), Section 
10.5 and Jacod and Shiryaev (2003), Section 7.3] to the stochastic integral 
Di(s), and obtain 

13- 



E{\Di{s)\ ip }<C max E 

l<r<m 



T r -1 



(50) < C max (r r - r r -if max E{\ju(s)f} 

l<r<m 0<s<l 

KCm-P max ^{| 7ii ( 5 )|^}. 

0<s<l 

Collecting together (47)-(50) we arrive at 
£{|[X,,X#)-r,/}<Cm-^ 

0<s<l 

This completes the proof of the lemma. □ 
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7.4. Analyze Hs for the effect of nonsynchronization. From (35) we know 
that H(l) = (iTjj(l)), . . . , H(8) = (fljj(8)) are attributed to nonsynchroniza- 
tion in true log prices. 

Proposition 3. Under the assumptions of Theorem 1, we have for 1 < 
*, 3<P, 

E{\H l3 (l)f + ■■■ + IHijWf} < C{(m/nf + n"^ 2 }. 

Proof. Because of similarity we provide arguments only for Hij(l), 
that is, to show 



Since 



where 



we have 



< C{{m/nf + n~PI 2 }. 

%(i) = iX>g(i), 



k=l 



m ,. T k T k 

H U 1 )=J2 / ' dx ^ s ) / ' dx ^ 

„ i J r k Jr k 



1 K 



k=l 

So we need to show for 1 < k < K, 



(51) E{\H^l)f}<C{(m/nf+n^/ 2 }. 



ij 

Let 



(52) 



flg(l) = / " dJQ(s) / " ^-(a) - / " 7 ijrf»- 



Note that 

7ij 00 



III /^T rL 

r=l i,r 



(53) 



r=l ^ 



The definition of r 4 fc r and assumption A2 show 

(54) < h% = r r k - r£. V rj; r < Cn- 1 . 
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Prom the relationship between H k Al) and H k Al) and using (53) and (54), 
we prove (51) as follows: 

m r \ 



E{\H k Al)\f 3 }<CE< 



r=l Jo 



+cE{\m(i)f} 



< Cn~ p -nt 



■'E 

r=l 



max E{\ 7ij (s)\P} + CE([Jff{ (1), &*{l)f 2 ) 

0<s<l J J 



< C(mjnf + Cn~PI 2 = C{{m/nf + n"^ 2 }, 

where for the second inequality we have applied the Burkholder inequality 
[He, Wang and Yan (1992), Section 10.5 and Jacod and Shiryaev (2003), 
Section 7.3] to E{\H^(l)f}, and the third inequality is due to Lemma 7 
below. □ 

Lemma 7. Under the assumptions of Theorem 1, we have for 1 < k < K 
and 1 < i, j <p, 

Proof. With defined in (54), the same argument for deriving (46) 
shows 

= E / " {A 2 ,r(«)7«(«) + DlMlnis) + 21),A*)l)j,-, i \*)} ds 

m r \ 



pi 

V / h%{Dl r (uh% + r k T ) li{ {uh% + r r fc ) 

r=l J ° 



+ D\ r {uh% + T k r ) lj3 {uh% + r r fc ) 

+ 2D i , r (uh , ? j + T^)Dj^ij(uh^ + r k )} du 



pi m 

J ° r =l 



+ D 2 jr {uh k i:j + rj'hjjuh-j + r r fe )} du, 

where the last inequality is due to (54) and the facts that \ jij\ 2 < Ju'Jjj, and 
thus 



2\D ijr (s)D jtr (s)j ij (s)\ < Dl r (s) lu (s) + Dl r (s) 7jj (s). 
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Hence, taking /3/2 power and then expectation on both sides of the inequality 
for [H^(l), H-°Al)], we prove the lemma as follows: 

E([E* (1)^(1)]^) 

m 

r=l ~ S - 
m 

< Cn-WmP^ max : {^(^tol^ + l^rWTiiWI^} 

* — '0<s<l J 
r=l 

where the last equality is due to the fact, which was proved in (49)-(50), 
E{\Dl r (s)ln{s)f 12 } < yl EDf r { 8 )E^{s) < Cm^' 2 rzaxE^s)}. Q 
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